Characteristics-Informed Neural Networks for Forward and Inverse Hyperbolic Problems

Author: Braga-Neto Ulisses
Publication date: 28/12/2022

We propose characteristic-informed neural networks (CINN), a simple and efficient machine learning approach for solving forward and inverse problems involving hyperbolic PDEs. Like physics-informed neural networks (PINN), CINN is a meshless machine learning solver with universal approximation capabilities. Unlike PINN, which enforces a PDE softly via a multi-part loss function, CINN encodes the characteristics of the PDE in a general-purpose deep neural network trained with the usual MSE data-fitting regression loss and standard deep learning optimization methods. This leads to faster training and can avoid well-known pathologies of gradient descent optimization of multi-part PINN loss functions. If the characteristic ODEs can be solved exactly, which is true in important cases, the output of a CINN is an exact solution of the PDE, even at initialization, preventing the occurrence of non-physical outputs. Otherwise, the ODEs must be solved approximately, but the CINN is still trained only using a data-fitting loss function. The performance of CINN is assessed empirically in forward and inverse linear hyperbolic problems. These preliminary results indicate that CINN is able to improve on the accuracy of the baseline PINN, while being nearly twice as fast to train and avoiding non-physical solutions. Future extensions to hyperbolic PDE systems and nonlinear PDEs are also briefly discussed.
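The claim that the output is an exact solution even at initialization can be illustrated on the simplest hyperbolic case. The following is a minimal sketch, not the paper's implementation: it assumes the constant-coefficient advection equation u_t + c u_x = 0, whose characteristics are the lines x - c t = const, and it uses a hypothetical untrained one-hidden-layer network (`make_tiny_net`) with an arbitrary speed `C`. Feeding the network only the characteristic coordinate x - c t makes every output satisfy the PDE, whatever the weights are.

```python
import math
import random

C = 2.0  # assumed constant advection speed (hypothetical value)

def make_tiny_net(hidden=8, seed=0):
    """Untrained one-hidden-layer tanh network of a single input (illustrative only)."""
    rng = random.Random(seed)
    w1 = [rng.gauss(0, 1) for _ in range(hidden)]
    b1 = [rng.gauss(0, 1) for _ in range(hidden)]
    w2 = [rng.gauss(0, 1) for _ in range(hidden)]

    def net(xi):
        return sum(w2[j] * math.tanh(w1[j] * xi + b1[j]) for j in range(hidden))

    return net

def make_solution(net, c=C):
    """Compose the network with the characteristic coordinate xi = x - c*t."""
    def u(x, t):
        return net(x - c * t)
    return u

def pde_residual(u, x, t, c=C, h=1e-5):
    """Central-difference estimate of u_t + c*u_x at the point (x, t)."""
    u_t = (u(x, t + h) - u(x, t - h)) / (2 * h)
    u_x = (u(x + h, t) - u(x - h, t)) / (2 * h)
    return u_t + c * u_x

u = make_solution(make_tiny_net())

# Largest residual on a small grid; it stays at finite-difference noise level
# even though the network has random, untrained weights.
max_residual = max(
    abs(pde_residual(u, x / 10, t / 10))
    for x in range(-10, 11)
    for t in range(0, 11)
)
print(max_residual)
```

In this sketch, training would only need a data-fitting loss on initial or observed values, since the PDE holds by construction; the variable-coefficient and approximate-ODE cases mentioned in the abstract are not covered here.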

arXiv.org e-Print Archive

Classification and Error Estimation for Discrete Data

Author: Braga-Neto Ulisses M
Publication venue: Bentham Science Publishers Ltd.
Publication date: 01/01/2009

Discrete classification is common in Genomic Signal Processing applications, in particular in classification of discretized gene expression data, and in discrete gene expression prediction and the inference of Boolean genomic regulatory networks. Once a discrete classifier is obtained from sample data, its performance must be evaluated through its classification error. In practice, error estimation methods must then be employed to obtain reliable estimates of the classification error based on the available data. Both classifier design and error estimation are complicated, in the case of Genomics, by the prevalence of small-sample data sets in such applications. This paper presents a broad review of the methodology of classification and error estimation for discrete data, in the context of Genomics, focusing on the study of performance in small sample scenarios, as well as asymptotic behavior.

CiteSeerX

Crossref

PubMed Central

Rank discriminants for predicting phenotypes from RNA expression

Author: Afsari Bahman
Braga-Neto Ulisses M.
Geman Donald
Publication venue: Institute of Mathematical Statistics
Publication date: 01/01/2014

Statistical methods for analyzing large-scale biomolecular data are commonplace in computational biology. A notable example is phenotype prediction from gene expression data, for instance, detecting human cancers, differentiating subtypes and predicting clinical outcomes. Still, clinical applications remain scarce. One reason is that the complexity of the decision rules that emerge from standard statistical learning impedes biological understanding, in particular, any mechanistic interpretation. Here we explore decision rules for binary classification utilizing only the ordering of expression among several genes; the basic building blocks are then two-gene expression comparisons. The simplest example, just one comparison, is the TSP classifier, which has appeared in a variety of cancer-related discovery studies. Decision rules based on multiple comparisons can better accommodate class heterogeneity, and thereby increase accuracy, and might provide a link with biological mechanism. We consider a general framework ("rank-in-context") for designing discriminant functions, including a data-driven selection of the number and identity of the genes in the support ("context"). We then specialize to two examples: voting among several pairs and comparing the median expression in two groups of genes. Comprehensive experiments assess accuracy relative to other, more complex, methods, and reinforce earlier observations that simple classifiers are competitive. Comment: Published in the Annals of Applied Statistics (http://www.imstat.org/aoas/) by the Institute of Mathematical Statistics (http://www.imstat.org) at http://dx.doi.org/10.1214/14-AOAS738
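The two-gene comparison rule can be sketched as follows. This is a minimal illustration under stated assumptions, not the paper's code: the function names (`pair_score`, `fit_tsp`, `predict_tsp`, `predict_vote`) and the toy expression values are hypothetical, and the pair is selected by the usual top-scoring-pair criterion, the largest between-class difference in the frequency of the event "gene i is expressed below gene j". The voting function takes already-oriented pairs and does not reproduce the paper's data-driven choice of how many pairs to use.

```python
from itertools import combinations

def pair_score(samples, labels, i, j):
    """Frequency of x[i] < x[j] in class 1 minus its frequency in class 0."""
    freq = {}
    for cls in (0, 1):
        rows = [x for x, y in zip(samples, labels) if y == cls]
        freq[cls] = sum(x[i] < x[j] for x in rows) / len(rows)
    return freq[1] - freq[0]

def fit_tsp(samples, labels):
    """Pick the gene pair whose ordering differs most between the classes."""
    n_genes = len(samples[0])
    i, j = max(
        combinations(range(n_genes), 2),
        key=lambda p: abs(pair_score(samples, labels, *p)),
    )
    # Orient the pair so that x[i] < x[j] is a vote for class 1.
    if pair_score(samples, labels, i, j) < 0:
        i, j = j, i
    return i, j

def predict_tsp(pair, x):
    """Single-comparison decision rule."""
    i, j = pair
    return 1 if x[i] < x[j] else 0

def predict_vote(pairs, x):
    """Majority vote among several oriented pairs."""
    votes = sum(predict_tsp(p, x) for p in pairs)
    return 1 if 2 * votes > len(pairs) else 0

# Hypothetical toy data: rows are samples, columns are three genes.
samples = [
    [5.0, 1.0, 3.0], [6.0, 2.0, 3.0], [4.0, 1.0, 9.0],  # class 0
    [1.0, 5.0, 3.0], [2.0, 6.0, 8.0], [1.0, 4.0, 2.0],  # class 1
]
labels = [0, 0, 0, 1, 1, 1]

pair = fit_tsp(samples, labels)
print(pair, predict_tsp(pair, [0.5, 3.0, 1.0]))
```

Because the rule depends only on the ordering of two expression values, it is unchanged by any monotone rescaling applied to a sample, which is one reason such rules are easy to interpret.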

arXiv.org e-Print Archive

CiteSeerX

Texas A&M Repository

Impact of Missing Value Imputation on Classification for DNA Microarray Gene Expression Data—A Model-Based Study

Author: Braga-Neto Ulisses
Dougherty Edward R
Sun Youting
Publication venue: BioMed Central
Publication date: 01/01/2009

Crossref

Springer - Publisher Connector

PubMed Central

The Illusion of Distribution-Free Small-Sample Classification in Genomics

Author: Braga-Neto Ulisses M
Dougherty Edward R
Zollanvari Amin
Publication venue: Bentham Science Publishers Ltd
Publication date: 01/01/2011

Classification has emerged as a major area of investigation in bioinformatics owing to the desire to discriminate phenotypes, in particular, disease conditions, using high-throughput genomic data. While many classification rules have been posed, there is a paucity of error estimation rules and an even greater paucity of theory concerning error estimation accuracy. This is problematic because the worth of a classifier depends mainly on its error rate. It is commonplace in bioinformatics papers to have a classification rule applied to a small labeled data set and the error of the resulting classifier be estimated on the same data set, most often via cross-validation, without any assumptions being made on the underlying feature-label distribution. Concomitant with a lack of distributional assumptions is the absence of any statement regarding the accuracy of the error estimate. Without such a measure of accuracy, the most common one being the root-mean-square (RMS), the error estimate is essentially meaningless and the worth of the entire paper is questionable. The concomitance of an absence of distributional assumptions and of a measure of error estimation accuracy is assured in small-sample settings because even when distribution-free bounds exist (and that is rare), the sample sizes required under the bounds are so large as to make them useless for small samples. Thus, distributional bounds are necessary and the distributional assumptions need to be stated. Owing to the epistemological dependence of classifiers on the accuracy of their estimated errors, scientifically meaningful distribution-free classification in high-throughput, small-sample biology is an illusion.
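To make the RMS of an error estimator concrete, here is a minimal Monte Carlo sketch. Everything in it is an assumption chosen for illustration and not taken from the paper: a one-dimensional model with unit-variance Gaussian classes whose means are `DELTA` apart, equal priors, `N` samples per class, a nearest-mean classifier, and leave-one-out cross-validation as the estimator. Because the model is stated, the true error of each designed classifier is available in closed form, and the RMS deviation between the estimated and true errors can be computed.

```python
import math
import random

DELTA = 1.0  # assumed distance between the class means (hypothetical)
N = 10       # assumed sample size per class (hypothetical)

def phi(z):
    """Standard normal CDF."""
    return 0.5 * (1.0 + math.erf(z / math.sqrt(2.0)))

def mean(v):
    return sum(v) / len(v)

def classify(x, m0, m1):
    """Nearest-mean rule: pick the class whose sample mean is closer."""
    return 1 if abs(x - m1) < abs(x - m0) else 0

def true_error(m0, m1):
    """Exact error of the designed classifier under the assumed model."""
    thr = 0.5 * (m0 + m1)
    err0 = 1.0 - phi(thr)    # class 0 point lands above the threshold
    err1 = phi(thr - DELTA)  # class 1 point lands below the threshold
    if m1 < m0:              # the rule is flipped when the sample means swap
        err0, err1 = 1.0 - err0, 1.0 - err1
    return 0.5 * (err0 + err1)

def loo_error(x0, x1):
    """Leave-one-out cross-validation estimate of the classification error."""
    errors = 0
    for k in range(len(x0)):
        rest = x0[:k] + x0[k + 1:]
        errors += classify(x0[k], mean(rest), mean(x1)) != 0
    for k in range(len(x1)):
        rest = x1[:k] + x1[k + 1:]
        errors += classify(x1[k], mean(x0), mean(rest)) != 1
    return errors / (len(x0) + len(x1))

def rms_of_loo(trials=2000, seed=0):
    """Monte Carlo estimate of sqrt(E[(estimated error - true error)^2])."""
    rng = random.Random(seed)
    total = 0.0
    for _ in range(trials):
        x0 = [rng.gauss(0.0, 1.0) for _ in range(N)]
        x1 = [rng.gauss(DELTA, 1.0) for _ in range(N)]
        total += (loo_error(x0, x1) - true_error(mean(x0), mean(x1))) ** 2
    return math.sqrt(total / trials)

rms = rms_of_loo()
print(rms)
```

The computation is only possible because a feature-label distribution was assumed; with no model, `true_error` has nothing to evaluate against, which is the situation the abstract criticizes.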

CiteSeerX

Crossref

Harvard University - DASH

PubMed Central

BPDA - A Bayesian peptide detection algorithm for mass spectrometry

Author: Braga-Neto Ulisses
Dougherty Edward R
Sun Youting
Zhang Jianqiu
Publication venue: BioMed Central
Publication date: 01/01/2010

Springer - Publisher Connector

PubMed Central

Texas A&M Repository